Querying multilevel annotation and alignment for detecting grammatical valence divergencies

Author

  • Oliver Čulo
Abstract

The valence concept has been used in machine translation as well as in didactics in order to build valence dictionaries for the respective uses. Most valence dictionaries have been built manually, but given the growing number of parallel resources, it would be desirable to exploit these automatically as a basis for building bilingual valence dictionaries. The present contribution reports on a pilot study on a German-English parallel corpus. In this study, patterns of verb plus grammatical functions were extracted from parallel sentences. The paper reports on some of the basic findings of this extraction, regarding divergencies both in valence patterns and in the syntactic realisation of the predicate, i.e. the verb. These findings set the agenda for further research, which should focus on how to detect semantic shifts of valence carriers in translation and how such shifts affect valence.
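As a rough illustration of the extraction step described in the abstract, the sketch below compares verb-plus-grammatical-function patterns across aligned sentence pairs and counts the pairs whose patterns differ. It is not the author's method: the data layout, the function labels (`SUBJ`, `DOBJ`, `IOBJ`), the example verbs, and the name `find_divergences` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: each aligned pair holds one (verb lemma, grammatical
# functions) pattern for German and one for English. Labels are assumed.
aligned_pairs = [
    (("gefallen", ("SUBJ", "IOBJ")), ("like", ("SUBJ", "DOBJ"))),
    (("sehen", ("SUBJ", "DOBJ")), ("see", ("SUBJ", "DOBJ"))),
    (("gefallen", ("SUBJ", "IOBJ")), ("like", ("SUBJ", "DOBJ"))),
]


def find_divergences(pairs):
    """Count aligned verb pairs whose sets of grammatical functions differ."""
    counts = Counter()
    for (de_verb, de_funcs), (en_verb, en_funcs) in pairs:
        # Sort the functions so that ordering differences are not counted.
        de_pattern = tuple(sorted(de_funcs))
        en_pattern = tuple(sorted(en_funcs))
        if de_pattern != en_pattern:
            counts[(de_verb, de_pattern, en_verb, en_pattern)] += 1
    return counts


divergences = find_divergences(aligned_pairs)
for (de_verb, de_pattern, en_verb, en_pattern), n in divergences.items():
    print(de_verb, de_pattern, "<->", en_verb, en_pattern, "x", n)
```

A comparison of this kind only captures differences in the inventory of grammatical functions. Divergences in how the predicate itself is realised, which the abstract also mentions, would need additional information about the aligned verb tokens.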

Similar resources

Discontinuous Constituents: a Problematic Case for Parallel Corpora Annotation and Querying

In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...

Semi-Automatic Phonological Annotations of Speech by Grammatical Inference

This paper describes a technique for automatically generating multiple levels of linguistic annotation for a corpus of speech utterances. Using a training corpus of multilevel annotations, a corresponding finite-state representation is automatically constructed by grammatical inference. This finite-state description is then employed as a knowledge component to automatically generate a new multi...

Detecting Grammatical Errors in Machine Translation Output Using Dependency Parsing and Treebank Querying

Despite the recent advances in the field of machine translation (MT), MT systems cannot guarantee that the sentences they produce will be fluent and coherent in both syntax and semantics. Detecting and highlighting errors in machine-translated sentences can help post-editors to focus on the erroneous fragments that need to be corrected. This paper presents two methods for detecting grammatical ...

Robust clause boundary identification for corpus annotation

The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca. 245 million running words and available for querying via the Keeleveeb language portal. The system needs information about parts of speech and grammatical categories coded in the word-forms, i.e. it ta...

Consistency Checking for Treebank Alignment

This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucia...

Publication date: 2012